Estimating the unseen: A sublinear-sample canonical estimator of distributions
Authors: Gregory Valiant and Paul Valiant
Abstract
We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [29], this settles the longstanding question of the sample complexities of these estimation problems (up to constant factors). Our algorithm estimates these properties up to an arbitrarily small additive constant, using O(n/log n) samples; [29] shows that no algorithm on o(n/log n) samples can achieve this (where n is a bound on the support size, or, in the case of estimating the support size, 1/n is a lower bound on the probability of any element of the domain). Previously, no explicit sublinear-sample algorithms for either of these problems were known. Additionally, our algorithm runs in time linear in the number of samples used.

Think not, because no man sees,
Such things will remain unseen.
– Henry Wadsworth Longfellow, from "The Builders"
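For readers unfamiliar with the setting, the short Python sketch below is purely illustrative and is not the paper's algorithm: it computes the sample's "fingerprint" (how many domain elements appear exactly i times) and the naive plug-in entropy estimate, which is badly biased when the sample is small relative to the support size; the paper's contribution is an estimator that corrects for the unseen portion of the distribution using only O(n/log n) samples. The function names and toy sample are hypothetical.

# Illustrative sketch only (not the paper's estimator).
from collections import Counter
import math

def fingerprint(sample):
    """Map i -> number of distinct elements observed exactly i times."""
    counts = Counter(sample)          # element -> multiplicity
    return Counter(counts.values())   # multiplicity -> number of elements

def plugin_entropy(sample):
    """Empirical (plug-in) entropy in nats; badly biased for small samples."""
    m = len(sample)
    counts = Counter(sample)
    return -sum((c / m) * math.log(c / m) for c in counts.values())

sample = ["a", "b", "a", "c", "d", "a", "b", "e"]
print(fingerprint(sample))     # Counter({1: 3, 2: 1, 3: 1})
print(plugin_entropy(sample))  # about 1.49 nats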
Similar References
Estimating the Unseen: Improved Estimators for Entropy and other Properties
Recently, Valiant and Valiant [1, 2] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear-sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most n distinct elem...
Minimax Estimator of a Lower Bounded Parameter of a Discrete Distribution under a Squared Log Error Loss Function
The problem of estimating the parameter θ, when it is restricted to a lower-bounded interval, in a class of discrete distributions including the Binomial, Negative Binomial, and discrete Weibull, is considered. We give necessary and sufficient conditions under which the Bayes estimator of θ with respect to a two-point boundary-supported prior is minimax under the squared log error loss function...
Estimating a Bounded Normal Mean Relative to Squared Error Loss Function
Consider a random sample from a normal distribution with unknown mean and known variance. The usual estimator of the mean, i.e., the sample mean, is the maximum likelihood estimator, which under the squared error loss function is minimax and admissible. In many practical situations, the mean is known in advance to lie in a bounded interval. In this case, the maximum likelihood estimator...
INSPECTRE: Privately Estimating the Unseen
We develop differentially private methods for estimating various distributional properties. Given a sample from a discrete distribution p, some functional f , and accuracy and privacy parameters α and ε, the goal is to estimate f(p) up to accuracy α, while maintaining ε-differential privacy of the sample. We prove almost-tight bounds on the sample size required for this problem for several func...
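As a generic illustration of the privacy requirement described above (and not the method of this paper), the sketch below releases a numeric estimate with ε-differential privacy via the Laplace mechanism; the estimate value and the sensitivity bound delta_f are hypothetical placeholders that would have to be derived for the specific estimator.

# Hypothetical sketch, not the INSPECTRE algorithm: epsilon-differential
# privacy via the Laplace mechanism. delta_f must upper-bound how much the
# underlying estimator can change when one sample point is replaced.
import numpy as np

def private_release(estimate, delta_f, epsilon, rng=None):
    """Release `estimate` with epsilon-DP by adding Laplace(delta_f/epsilon) noise."""
    rng = rng if rng is not None else np.random.default_rng()
    return estimate + rng.laplace(loc=0.0, scale=delta_f / epsilon)

# Usage: a non-private entropy estimate of 1.49, assumed sensitivity 0.05, epsilon 0.5
print(private_release(1.49, delta_f=0.05, epsilon=0.5))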
Comparison of Small Area Estimation Methods for Estimating Unemployment Rate
Extended Abstract. In recent years, the need for small area estimation has greatly increased for large surveys, particularly household surveys at the Statistical Centre of Iran (SCI), because of the costs and respondent burden. The lack of suitable auxiliary variables between two decennial housing and population censuses is a challenge for SCI in using these methods. In general, the...
Journal: Electronic Colloquium on Computational Complexity (ECCC)
Volume: 17
Publication date: 2010